1.5D Parallel Sparse Matrix-Vector Multiply
Authors
Abstract
There are three common parallel sparse matrix-vector multiply algorithms: 1D row-parallel, 1D column-parallel, and 2D row-column-parallel. The 1D parallel algorithms offer the advantage of having only one communication phase. The 2D parallel algorithm, on the other hand, is more scalable thanks to its flexibility in distributing fine-grain tasks, but it suffers from two communication phases. Here, we introduce the novel concept of heterogeneous messages, where a heterogeneous message may contain both input-vector entries and partially computed output-vector entries. This concept not only decreases the number of messages but also enables fusing the input- and output-communication phases into a single phase. We utilize these findings to propose a 1.5D parallel sparse matrix-vector multiply algorithm, called local row-column-parallel. The proposed algorithm requires a local fine-grain partitioning, where locality refers to the constraint that each fine-grain task be assigned to the processor that holds its input-vector entry, its output-vector entry, or both. This constraint nevertheless turns out not to be very restrictive, so we achieve a partitioning quality close to that of the 2D parallel algorithm. We propose two methods for local fine-grain partitioning. The first is based on a novel directed hypergraph partitioning model that minimizes total communication volume subject to a load-balance constraint and an additional locality constraint, which is handled by adopting and adapting a recent, simple yet effective approach. The second method has two parts: the first finds a distribution of the input and output vectors, and the second finds a nonzero/task distribution that exactly minimizes total communication volume while keeping the vector distribution intact.
We conduct our experiments on a large set of test matrices to evaluate the partitioning qualities and partitioning times of these proposed 1.5D methods.
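To make the 1D row-parallel baseline concrete, the following is a minimal sketch, not taken from the paper: each processor owns a contiguous-or-not block of rows together with the conformal entries of the input vector x, and the single communication phase consists of gathering the remote x-entries each processor needs. The function name, the dense matrix representation, and the word-per-entry volume accounting are illustrative assumptions.

```python
import numpy as np

def row_parallel_spmv(A, x, parts):
    """Simulate 1D row-parallel SpMV: processor p owns the rows listed in
    parts[p] and the matching entries of x (conformal vector partition).
    Returns y = A @ x together with the total communication volume, i.e.
    the number of x-entries received from other processors in the single
    expand (input-communication) phase."""
    n = A.shape[0]
    owner = np.empty(n, dtype=int)          # owner[i] = processor owning row i and x[i]
    for p, rows in enumerate(parts):
        owner[rows] = p
    y = np.zeros(n)
    volume = 0
    for p, rows in enumerate(parts):
        needed = set()                      # x-indices p must receive from other processors
        for i in rows:
            cols = np.nonzero(A[i])[0]
            needed.update(j for j in cols if owner[j] != p)
            y[i] = A[i] @ x                 # local multiply once remote x-entries arrive
        volume += len(needed)               # one word per remote x-entry
    return y, volume
```

For a 4x4 matrix split over two processors, the simulation reports both the product and how many x-entries cross processor boundaries; the 2D and 1.5D algorithms of the paper aim to reduce exactly this volume while controlling the number of communication phases.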
Similar resources
An Improved Sparse Matrix-Vector Multiply Based on Recursive Sparse Blocks Layout
The Recursive Sparse Blocks (RSB) is a sparse matrix layout designed for coarse-grained parallelism and reduced cache misses when operating with matrices larger than a computer's cache. By laying out the matrix in sparse, non-overlapping blocks, we allow for the shared-memory parallel execution of transposed Sparse Matrix-Vector multiply (SpMV), with higher efficiency than the tradi...
A Library for Parallel Sparse Matrix Vector Multiplies
We provide parallel matrix-vector multiply routines for 1D and 2D partitioned sparse square and rectangular matrices. We clearly give pseudocodes that perform necessary initializations for parallel execution. We show how to maximize overlapping between communication and computation through the proper usage of compressed sparse row and compressed sparse column formats of the sparse matrices. We ...
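The routine described above operates on compressed sparse row (CSR) storage. As a point of reference, here is a minimal textbook CSR kernel, a sketch and not the library's actual code; the function name and argument order are illustrative.

```python
import numpy as np

def csr_spmv(indptr, indices, data, x):
    """y = A @ x for A in compressed sparse row (CSR) form:
    the nonzeros of row i occupy positions indptr[i]:indptr[i+1]
    of the parallel arrays indices (column ids) and data (values)."""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y
```

The CSC variant swaps the roles of rows and columns and scatters into y instead of gathering from x, which is why holding both formats helps overlap communication with computation.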
Revisiting Hypergraph Models for Sparse Matrix Partitioning
We provide an exposition of hypergraph models for parallelizing sparse matrix-vector multiplies. Our aim is to emphasize the expressive power of hypergraph models. First, we set forth an elementary hypergraph model for parallel matrix-vector multiply based on one-dimensional (1D) matrix partitioning. In the elementary model, the vertices represent the data of a matrix-vector multiply, and the n...
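The elementary 1D column-net model mentioned above can be sketched as follows, under standard assumptions (rowwise partitioning with a conformal vector partition): vertex i is row i, net j connects the rows with a nonzero in column j, and the communication volume of row-parallel SpMV equals the connectivity-minus-one cutsize of the partition. The function name and dense-matrix input are illustrative.

```python
import numpy as np

def column_net_volume(A, part):
    """Column-net hypergraph view of 1D rowwise partitioning:
    vertex i = row i, net j = {rows i : A[i, j] != 0}.  For a row
    partition `part` (part[i] = processor of row i), the total
    communication volume of row-parallel SpMV is the connectivity-1
    metric: sum over nets of (number of parts touched - 1)."""
    volume = 0
    for j in range(A.shape[1]):
        rows = np.nonzero(A[:, j])[0]
        touched = {part[i] for i in rows}
        volume += len(touched) - 1
    return volume
```

Minimizing this cutsize subject to balanced vertex weights is exactly what hypergraph partitioners such as PaToH target.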
Distributed Disk-based Solution Techniques for Large Markov Models
Very large Markov chains often arise from stochastic models of complex real-life systems. In this paper we investigate disk-based techniques for solving such Markov chains on a distributed-memory parallel computer. We focus on two scalable numerical methods, namely the Jacobi and Conjugate Gradient Squared (CGS) algorithms. The critical bottleneck in these methods is the parallel sparse matrix-...
Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format
In earlier work we have introduced the "Recursive Sparse Blocks" (RSB) sparse matrix storage scheme oriented towards cache-efficient matrix-vector multiplication (SpMV) and triangular solution (SpSV) on cache-based shared-memory parallel computers. Both the transposed (SpMV^T) and symmetric (SymSpMV) matrix-vector multiply variants are supported. RSB stands for a meta-format: it recursively...
Journal: SIAM J. Scientific Computing
Volume 40, Issue
Pages -
Published: 2018